ABSTRACT

Using Twitter during academic conferences is a way of engaging
and connecting an audience that is inherently multicultural by the
nature of scientific collaboration. English is expected to be the
lingua franca bridging communication and integration between
speakers of different mother tongues. However, little research has
been done to support this assumption. In this paper we analyze
how integrated language communities are by examining the
scholars' tweets posted during 26 Computer Science conferences
over a time span of five years. We found that although English is
the most popular language used to tweet during conferences, a
significant proportion of people also tweet in other languages. In
addition, people who tweet solely in English interact mostly within
the same group (English monolinguals), while people who speak
other languages tend to show a more diverse interaction with other
lingua groups. Finally, we also found that the people who interact
with other Twitter users show a more diverse language distribution,
while people who do not interact mostly post tweets in a single
language. These results suggest that the number of languages a
user speaks is related to how they interact, which can affect the
interaction dynamics of online communities.

Categories and Subject Descriptors 

J.4 [Social and Behavioral Sciences]: Sociology 
1. INTRODUCTION

In the past few years, Twitter has been used as a conference
backchannel platform in academic events, with the aim of expanding
community communication and participation [1, 10]. Attendees
using Twitter are generally involved in note taking, sharing
resources and reporting individual real-time reactions to events,
covering both conference presentations and conference social
activities. This supports scholars' activities such as disseminating
their work and engaging the general public and newcomer scientists
in the research communities [8]. It is common practice in research
conferences to use hashtags in tweets to identify the particular
event (e.g. #hypertext2015). International academic conferences
have a diverse community, with different cultural backgrounds and
languages. Thus, it is interesting to analyze how language affects
the generation of content and the interaction among attendees. Such
a study would make it possible to observe how integrated a research
community is, as well as to identify its blind spots in
communication. This can be of special interest to conference
organizers, not only to evaluate communication but also to have an
overview of their audiences.

Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are not
made or distributed for profit or commercial advantage and that copies bear
this notice and the full citation on the first page. Copyrights for components
of this work owned by others than ACM must be honored. Abstracting with
credit is permitted. To copy otherwise, or republish, to post on servers or to
redistribute to lists, requires prior specific permission and/or a fee. Request
permissions from permissions@acm.org.
HT'2015, Cyprus, Turkey
Copyright 20XX ACM X-XXXXX-XX-X/XX/XX ...$15.00.

Despite the research done in the past [12] on academic
conferences, little has been done on language communities and the
communication established among them. To bridge this gap, we
explore the language of 7M tweets posted by 18K users during 26
Computer Science conferences over five years (from one week
before to one week after each conference). We group users by the
language(s) they use to tweet in order to explore how different
language communities interact. Although English is expected to be
the lingua franca of many international events, we ask to what
extent people use other languages on Twitter during academic
conferences.

Research Questions. Overall, our study was driven by the
following research questions:

• RQ1. Conference attendees' languages: To what extent do
people tweet in other languages beyond English in
conferences?

• RQ2. Interactions between lingua groups: How do lingua
groups interact with each other?

• RQ3. Effect of language: Is there an effect of language or
lingua group over online user interaction?

Main results. We find that most people tweet only in English
(61%) in conferences, but most of the tweets are posted by
multilingual users, and their participation varies significantly
across conferences. Additionally, we observe that English
monolinguals receive most of the attention and interact more within
their group, while the opposite is observed for most members of
other language communities. Finally, we show that people who do
not interact with other attendees are mostly monolinguals, while
people who interact with others present more language diversity,
with a more balanced distribution of monolinguals and
multilinguals.

2. DATASET 

We selected a representative set of conferences in Computer and
Information Science from the CORE Computer Science Conference
Ranking list¹: 26 conferences active on Twitter every year
between 2009 and 2013. Furthermore, we manually checked that
the selected conferences did not overlap with other events. To
retrieve the tweets from these events in previous years, we used the
Topsy API and crawled tweets containing the corresponding
official hashtag (e.g., #chi12, #www2009) within a two-week time
window around the dates each conference took place (from seven
days before the conference until seven days after it ended). We
found that these tweets were posted by 22,021 participants in total.
We acknowledge that these participants also interact with others
without using the conference hashtag, and because of this we also
crawled their timeline tweets during the same period. In total, we
obtained 6,993,693 tweets.

¹http://www.core.edu.au/index.php/conference-rankings

Conference  1-lingua  2-lingua  ≥3-lingua  col 5  col 6  col 7  col 8
…
MobileHCI   …         …         …          …      …      …      …
NIPS        …         …         …          …      …      …      …
SIGGRAPH    77%       16%       7%         … 24% …
SIGIR       68%       21%       12%        56%    58%    36%    39%
SIGMOD      72%       23%       6%         58%    …      19%    12%
SLE         59%       32%       9%         58%    58%    40%    40%
UBICOMP     71%       21%       9%         59%    57%    55%    44%
UIST        71%       24%       5%         60%    58%    35%    32%
VLDB        67%       26%       7%         56%    53%    29%    21%
WSDM        65%       22%       13%        61%    60%    48%    39%
WWW         52%       32%       15%        52%    51%    43%    40%
XP          58%       35%       7%         53%    52%    51%    54%

Table 1: Percentage of monolinguals, bilinguals and multilinguals
tweeting in each conference between 2009 and 2013 (col 2-4).
Diversity percentage for the different types of interactions (col 5-8).


Language Identification. To identify the language of the tweets,
we removed all URLs, mentions and hashtags. Then we set a
minimum threshold of 4 remaining words per tweet in order to
identify its language. The language detection task was performed
with a professional language tool provided by Yahoo Labs Barcelona
that is able to identify over 40 languages, as in [?]. Following this
process we were left with 6,184,775 tweets (88% of the initial
sample) with an identified language. Finally, we proceeded to model
each user by the three most frequent languages they used to tweet
(setting a minimum threshold of 5 tweets per language).
Consequently, we found 266 lingua groups with 18,347 users using
up to three different languages in their tweets.
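
The filtering and user-modeling steps described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the regular expression, the function names and the alphabetical ordering of the group label are assumptions, and the language detector itself (a proprietary tool in the paper) is not reproduced, so the sketch starts from per-tweet language codes.

```python
import re
from collections import Counter

# Assumed pattern for the tokens stripped before detection:
# URLs, @mentions and #hashtags.
_STRIP = re.compile(r"https?://\S+|[@#]\w+")

MIN_WORDS = 4            # minimum remaining words per tweet (from the text)
MIN_TWEETS_PER_LANG = 5  # minimum tweets per language (from the text)
MAX_LANGS = 3            # most frequent languages kept per user (from the text)


def clean_tweet(text):
    """Remove URLs, mentions and hashtags, then collapse whitespace."""
    return " ".join(_STRIP.sub(" ", text).split())


def is_detectable(text):
    """True if at least MIN_WORDS words remain after cleaning."""
    return len(clean_tweet(text).split()) >= MIN_WORDS


def lingua_group(tweet_langs):
    """Map a user's per-tweet language codes to a label such as 'en-es'.

    Keeps the MAX_LANGS most frequent languages that reach
    MIN_TWEETS_PER_LANG tweets; returns '' if no language qualifies.
    """
    counts = Counter(tweet_langs)
    kept = [lang for lang, n in counts.most_common()
            if n >= MIN_TWEETS_PER_LANG]
    return "-".join(sorted(kept[:MAX_LANGS]))
```

Under these assumptions, a user with ten English, six Spanish and two French tweets falls into the en-es group, because French does not reach the five-tweet threshold.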

3. RESULTS 

RQ1. To what extent do people tweet in other languages
beyond English across conferences?

As expected, we found that the majority of tweets are written
in English (76%). Nevertheless, due to the multicultural nature of
conferences, there is a non-negligible 24% of tweets in languages
other than English (en), such as French (fr), Spanish (es),
German (de) and Japanese (ja). Furthermore, we found in our
dataset that many people post tweets in more than a single language.

We quantify this observation in Table 1, which shows the
percentage of users who tweet in a single language (1-lingua), in two
languages (2-lingua) or in three or more (≥3-lingua) in each
conference. We observe that the percentage of people who tweet in
two or more languages ranges from close to 20% (AAAI, SIGGRAPH)
up to around 50% (ACMM, ICMT, WWW), showing important
differences among conferences in the distribution of users who tweet
in one or more languages. Based on these results, rather than
analyzing languages as isolated groups, we studied lingua groups
as communities of people who speak one or more languages.
Table 2 describes the top language communities by number of users.
The table shows that the majority of users are classified as English
monolinguals (61%) but, interestingly, they produce only 29% of all
tweets, with a moderate engagement (179.5 tweets per user).
In contrast, we see that users of multilingual groups are the most
engaged (3609.9 tweets/user for en-es-fr, 1016.7 for ca-en-es, and
944.9 for en-es-pt).

These results lead us to further analyze specific lingua groups
to unveil the interaction between language communities and their
online behavior.

Lingua     Users    Tweets   (tweets/user)
en         61.31%   29.14%   179.50
en-fr      6.46%    3.57%    208.79
en-es      3.79%    2.39%    238.14
de-en      2.18%    1.63%    281.89
en-nl      2.15%    1.50%    263.54
fr         2.00%    0.26%    49.05
en-ja      1.92%    3.55%    696.92
en-es-pt   1.62%    4.06%    944.93
en-pt      1.44%    0.35%    92.65
en-it      1.36%    1.56%    434.83
nl         1.36%    0.16%    43.33
ja         1.09%    1.09%    377.89
en-es-fr   0.93%    8.89%    3609.91
ca-en-es   0.79%    2.14%    1016.69
en-ko      0.57%    0.51%    340.24
es         0.52%    0.06%    42.92
Others     10.52%   39.14%

Table 2: Statistics of top lingua groups (more than 90 users). We
show the percentage of users belonging to each lingua (Users), the
percentage of tweets (Tweets) and the engagement (tweets/user).

General
Mentions (148,184)            Retweets (91,523)
Ling.   Att.   out-links      Ling.   Att.   out-links
en      67%    37%            en      66%    37%
en-fr   7%     56%            en-fr   7%     54%
de-en   3%     74%            de-en   3%     78%
en-es   3%     79%            en-es   3%     80%
en-ja   2%     35%            en-ja   2%     42%

Reciprocated
Mentions (25,956)             Retweets (6,496)
Ling.   Att.   out-links      Ling.   Att.   out-links
en      57%    48%            en      51%    52%
en-fr   8%     52%            en-fr   8%     44%
de-en   4%     72%            en-es   5%     61%
en-es   4%     71%            de-en   4%     74%
en-nl   3%     71%            en-nl   3%     70%

Table 3: Most popular linguas: lingua groups ordered by the
attention they receive across all conferences. The out-links column
represents the percentage of interactions going to other lingua groups.

RQ2. How do lingua groups interact with each other? 

To answer this question, we first define two types of interactions:
(1) general interactions and (2) reciprocated interactions. General
interactions comprise all retweets and all tweets containing
mentions, while reciprocated interactions are the retweets and
mentions that were reciprocated.

(a) Mentions between lingua groups. An edge from lingua
group x pointing to lingua group y shows the proportion of
mentions that people in lingua group x directed to people in lingua
group y. For readability, we only show probabilities > 0.05.
(b) Retweet interactions between top 50 most active lingua
groups.

Figure 1: (a) Nodes representing the top 10 lingua groups based on
mentions. (b) Interactions between lingua groups based on source
language (src) retweeting posts in a target language (dst).

Secondly, we measure diversity using the Gini-Simpson index,
as in [?], who called it the diversity index. This diversity index
ranges from 0 to 1 and measures the probability that two
interactions taken at random from a set of interactions are directed
to different lingua groups. Participants of a conference with a
diversity index close to 0 will tend to interact with people of the
same lingua group. Conversely, conferences with values close to 1
show a uniform distribution of interactions with other lingua groups.
We define the diversity D of a lingua group as:

$D(c, i) = 1 - \sum_{j} \left( \frac{I^{c}_{ij}}{N_i} \right)^{2}$    (1)

with $N_i = \sum_{j} I^{c}_{ij}$, where $I^{c}_{ij}$ is the total number
of interactions between people of lingua i and lingua j in
conference c, and $N_i$ is the total number of interactions of people
of lingua i in conference c. In order to know the diversity of a
conference, we average D(c, i) over all the linguas in conference c.
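
As a minimal sketch of Equation (1), the following computes the Gini-Simpson diversity of one lingua group and the conference-level average. The nested-dictionary input layout and the function names are assumptions made for illustration; the counts in the usage note are hypothetical and not taken from the dataset.

```python
def lingua_diversity(interactions):
    """Gini-Simpson diversity D(c, i) for one source lingua i.

    `interactions` maps each target lingua j to the count I_ij of
    interactions from lingua i to lingua j in one conference.
    """
    total = sum(interactions.values())  # N_i
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in interactions.values())


def conference_diversity(matrix):
    """Average D(c, i) over all source linguas i in a conference.

    `matrix` maps each source lingua to its `interactions` dict.
    """
    values = [lingua_diversity(row) for row in matrix.values()]
    return sum(values) / len(values) if values else 0.0
```

With hypothetical counts, a group whose interactions all go to one lingua has diversity 0, while a group that splits its interactions evenly between two linguas has diversity 0.5. Note that with k target groups the index cannot exceed 1 - 1/k, so values approach 1 only when interactions are spread over many groups.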

Table 1 reports the diversity for each conference (expressed as a
percentage). We find some interesting patterns showing that a
lower percentage of monolinguals is linked to higher diversity.
For example, ICMT is the most diverse conference for the
general type of interactions and its percentage of monolinguals is
the lowest of all (51%). Conversely, AAAI shows a high percentage
of monolinguals (82%) and the lowest diversity for the general
interactions. On the other hand, reciprocal interactions do not
appear to be related to the percentage of monolinguals. For example,
UBICOMP presents a high percentage of monolinguals and the highest
diversity for the reciprocal interactions.

Furthermore, we look at the attention received by members of
each lingua by calculating the number of mentions and retweets
received from different users. Table 3 shows the top 5 most
popular lingua groups. English monolinguals are clearly the
most mentioned and retweeted in both the general and the
reciprocated interactions. Although English monolinguals do not
produce most of the tweets, they still receive most of the attention.
This is mostly explained by the out-links column, which shows the
percentage of mentions and retweets directed at a different lingua
group. For example, we see that only 37% of the mentions and
retweets generated by English monolinguals refer to other groups.
Interestingly, English-Japanese bilinguals (en-ja) also prefer to
interact mostly within their group. Conversely, groups like en-fr,
de-en and en-es refer more to users of different lingua groups in
their interactions.
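
The two quantities reported in Table 3 can be sketched as below. This is only an illustration under stated assumptions: each interaction is represented as a hypothetical (source lingua, target lingua) pair, and attention is counted per interaction rather than per distinct user, which may differ from the exact counting used in the study.

```python
from collections import defaultdict


def attention_and_outlinks(edges):
    """Return {lingua: (attention_share, outlink_share)}.

    `edges` is a sequence of (source_lingua, target_lingua) pairs, one
    per mention or retweet. attention_share is the fraction of all
    interactions received by the group; outlink_share is the fraction
    of the group's own interactions aimed at a different group.
    """
    received = defaultdict(int)
    sent = defaultdict(int)
    sent_out = defaultdict(int)
    total = 0
    for src, dst in edges:
        total += 1
        received[dst] += 1
        sent[src] += 1
        if src != dst:
            sent_out[src] += 1

    result = {}
    for group in set(received) | set(sent):
        attention = received[group] / total
        outlinks = sent_out[group] / sent[group] if sent[group] else 0.0
        result[group] = (attention, outlinks)
    return result
```

For instance, with the hypothetical edges en→en, en→en, en→es and es→en, the en group receives three of the four interactions and sends one third of its own interactions to another group.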

More evidence of the unequal activity between lingua groups is
seen in Figure 1, which considers only the top 10 lingua groups and
shows (a) the mentions network (general type) and (b) the retweet
network (general type) across lingua groups. Figure 1a shows that
79% of all mentions from the en group are directed to users of the
same group. Moreover, 35% of mentions from the en-es lingua group
refer to users from the same group, and 48% to the en group.

In Figure 1b, the Sankey plot represents the network of retweets.
Again, we see that in most cases the English group retweets
members of the same group. At the same time, the English group
receives most of the attention from other language communities.
Interestingly, the lingua groups en-es-it, en-fr, en-es-pt and en-ja
show a similar pattern, preferentially retweeting users of their own
lingua group.

RQ3. Is there any effect of language or lingua group over 
online user interaction? 

We addressed this question by studying how the number of
languages a Twitter user speaks affects her online behavior. As
already explained, if a user has posted tweets in only one language
we consider her in the 1-lingua group (monolingual), while another
user tweeting in two languages will be in the 2-lingua group, and so
on. We found two results that show, at the general and at the
individual level, the effect of the number of languages on user
interaction. At the general level, we found that among the users
who posted tweets but who had not interacted with other people (by
mentioning them), the percentage of monolinguals (80.6%) is
considerably larger than that of multilinguals. A different picture
is seen among users who interacted at least once during the
conference (by mentioning someone in a tweet), since only 62.9% of
those users are monolinguals and the rest are multilinguals. We
conducted a chi-square test of proportions comparing the
distribution of monolinguals, bilinguals and trilinguals between
people who interacted and people who did not. We found a
statistically significant difference with χ² = 416.6, df = 2,
p-value < .001. This relation can be better observed in Figure 2,
where the group who interacted (right-side plot) had a more
balanced distribution and hence a higher entropy (a measure of
diversity [11]) of H(s) = 0.89, compared to a smaller diversity of
lingua groups among people who did not interact, with an entropy
H(s) = 0.61. Moreover, at the individual level we found that the
more languages a user speaks, the larger the likelihood that she
interacts with others. Table 4 shows the results of a logistic
regression where the dependent variable measures whether the user
interacted with other people or not. The factors in the regression
are the year of the conference and the number of languages the user
has used to tweet (n_languages). We observe that the number of
languages has a significant β coefficient of 0.666 (p < .001), which
can be interpreted as follows: keeping all the other factors fixed,
for each additional language the user speaks, the odds of
interacting in the network increase by 95% (since
$e^{0.666} \approx 1.95$).

Figure 2: Distribution of n-lingua groups considering users without
(left graph) and with reply/mention interactions (right graph).


Variable        coeff.      S.E.
year(=2009)     2.049***    (0.390)
year(=2010)     2.458***    (0.385)
year(=2011)     2.453***    (0.385)
year(=2012)     2.294***    (0.383)
year(=2013)     2.423***    (0.383)
n_languages     0.666***    (0.035)
Constant        -1.371***   (0.385)
Observations    26,281
Note: *p<0.1; **p<0.05; ***p<0.01

Table 4: Results of the logistic regression where the dependent
variable is whether the user interacted on Twitter (mentions) and the
independent variables are conference year and number of languages
spoken.
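
The step from the n_languages coefficient in Table 4 to an odds ratio can be checked with a few lines of code. This is a small worked example, not part of the original analysis; the function names are chosen for illustration, and only the coefficient value 0.666 comes from the table.

```python
import math

BETA_N_LANGUAGES = 0.666  # n_languages coefficient reported in Table 4


def odds_ratio(beta, delta=1.0):
    """Multiplicative change in the odds when the predictor rises by `delta`."""
    return math.exp(beta * delta)


def percent_change_in_odds(beta, delta=1.0):
    """Same change expressed as a percentage increase (or decrease)."""
    return (odds_ratio(beta, delta) - 1.0) * 100.0
```

Here odds_ratio(BETA_N_LANGUAGES) evaluates to roughly 1.95, so the odds increase by about 95% per additional language. Because the effect is multiplicative on the odds, two additional languages multiply them by about 1.95 squared rather than adding 190%.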

4. RELATED WORK 

There are several studies on the role of Twitter in academic
conferences. Letierce et al. [?] showed that Twitter is frequently
used to spread information among researchers using the official
conference hashtags. Wen et al. [?] studied conference participants
and found that newcomer students receive little attention from
senior members of the research community. In an extension of this
work, Wen et al. [?] analyzed 16 conferences over five years,
identifying factors that contribute to the continued participation
of users in the online Twitter conference activity. We have
continued this line of research by exploring the influence of
language during conferences.

The role of language in Twitter has also been studied. Hong et
al. [4] studied differences in usage patterns between language
communities in Twitter, while Kim et al. [5] performed a
sociolinguistic study on the role of mono- and bilinguals in Twitter
across multilingual societies such as Qatar, Quebec and
Switzerland. Inspired by them, we adopt similar methods to build
language communities, but we target different lingua groups
interacting at conferences.

A broader but certainly related topic of study is the impact of
culture on online communication. Garcia et al. [?] studied the
most discriminative features influencing international active
conversation and attention in Twitter by mapping nationality to IP
addresses (e-mails) or geolocated tweets. Language and nationality
are two important cultural dimensions in people's identities, but we
find that by focusing on language(s) we capture the multicultural
nature of most researchers who attend international conferences.

5. CONCLUSIONS & FUTURE WORK 

In this paper we show that the majority of users in Computer and
Information Science conferences tweet only in English and most of
the tweets are also posted in English. Nevertheless, our results
indicate that members of other lingua communities produce most
of the tweets and are more engaged than English monolinguals.

A second observation is that although English is the lingua franca
in academic conferences, English monolinguals still appear to
prefer to interact among themselves. The same happens for other
important communities such as English-Japanese bilinguals. This
is not the case for most other important communities, who tend
to interact more equally with members of other lingua groups.

Our final finding is that there is more language diversity among
people who interact with others on Twitter during conferences,
compared to people who do not. This result suggests an important
implication: although English is the standard for scientific
communication, diversity in language use is a catalyst for
interactions in a community.

These findings leave us with several questions and encourage us
to complement our work in several aspects. For example, which
other aspects of people's culture can influence the communication
gap across lingua groups? Can we identify that a research
community requires more diversity by analyzing user interaction on
Twitter? Can we identify user behavior related to specific lingua
groups, such that we can differentiate English-Spanish bilinguals
from English-German ones?
APPENDIX

The following tables show detailed data used in our analyses.